Abstract

This report studies reliability properties of store-and-forward
networks, analysis of network reliability, and algorithms for
minimum spanning trees. A study of the trade-offs between network size,
connectivity, and component reliability shows that for large networks
reliability will be a major, and perhaps dominant, design problem.
Recursive analysis techniques for loop and tree combinations greatly
reduce analysis cost, while improved methods for generating minimum
spanning trees have a similar effect for this fundamental network
problem.
SUMMARY


Technical  Problem 

The  Network  Analysis  Corporation  contract  with  the  Advanced 
Research  Projects  Agency  incorporates  the  following  objectives: 

To determine the most economical configurations for the ARPANET,
to study the properties of store and forward networks and to
develop procedures for analysis and design of reliable computer
communication networks.

General  Methodology 

The heart of the research program has been a dual attack on
basic network theoretical problems and the development of
computational techniques for the study of large networks.

Technical  Results 

Some of the results accomplished during the reporting period are:

• A study of the tradeoffs between network size, network
connectivity and component reliability was completed.
This study indicates that reliability will be a major
and perhaps dominant issue for large network design.
• A new method for reliability analysis which uses a
recursive technique has been developed to handle a
large class of networks composed of loops and trees.
This method allows a wide variety of reliability
criteria to be evaluated simultaneously at a small
fraction of the cost of previously known methods.

• New and improved computational techniques for finding
"minimum spanning trees" (a fundamental network
problem) were derived. This computation is a basic
ingredient in many large scale network algorithms.

Department  of  Defense  Implications 

Communication networks for meeting Department of Defense
requirements involve huge network structures that present
techniques are inadequate to handle. The results of the reporting
period highlight the role that reliability will play in such
networks, provide new techniques for the analysis of large
Defense Department networks and meet some of the computational
requirements for large scale network design.

Implications for Further Research

This report shows that for very large networks, cost/reliability
considerations must be given equal importance to cost/throughput
considerations. This means that there will be a need to develop
dramatically different ... to insure availability of resources in a
large network. The re...
I. RELIABILITY AND LARGE COMPUTER NETWORKS


1.  Introduction  and  Summary 

The  major  considerations  in  the  system  design  of  a  computer 
network  such  as  the  ARPANET  are: 

1)  Cost 

2)  Throughput 

3)  Delay  and  response  time 

4)  Network  reliability 

While it is essential to consider each of these constraints,
it often turns out that several are automatically satisfied by
designs that satisfy the rest. Initially, this was the
case for the ARPANET. The delay and response time was adequately
handled by slightly derating the line capacities
of the 50 kilobit links, and the reliability was adequate if
there were at least two node disjoint paths between each pair
of nodes. Thus, the cost-throughput tradeoff was the overriding
consideration. Given these conditions, it is possible
to design very efficient networks in a reasonable amount of
computing time. However, it is becoming evident that as the
ARPANET increases in size, the reliability constraints are
beginning to limit design choices. The cost-reliability tradeoff
may even replace the cost-throughput tradeoff as the basic design
consideration. While for small versions of the ARPANET any design
with at least two node disjoint paths between each node pair and
sufficient throughput would necessarily be reliable enough, initial
investigations indicate that for large networks sufficient
reliability automatically implies sufficient throughput. In any
case, it is clear that reliability constraints will play an ever
increasing role in the design process as the ARPANET becomes larger.
Considering this, it is quite sobering to note that many
large communication networks are being designed
with little consideration of network reliability (as distinguished
from component or element reliability).

Reliability analysis of computer networks is concerned
with the dependence of the reliability of the network on the
reliability of its nodes and links. Element reliability is
easily defined as, for example, the fraction of time the
element is operable, or by the mean time between failures
and expected repair time. The proper measure of network
reliability is not as clear and simple. Several possible
measures are: the number of elements which must be removed
to disconnect the network, the probability that the network
will be disconnected, the expected fraction of node pairs
which can communicate through the network, and the expected
throughput of the network subject to element failures. The above
measures are listed in order of their computational complexity.
Many other measures can and have been suggested. A whole other
class of measures arises when the nodes are not of equal importance,
as in centralized networks or hierarchical networks. In a centralized
network, one may be interested in the expected number of nodes which
can communicate with a central node. More general criteria arise
when different node pairs are weighted by their importance. For
example, communication between ILLIAC IV and certain other nodes
will be of high priority in the ARPANET. Most of our analysis
will deal with the expected fraction of node pairs communicating,
although in many cases any of the other criteria mentioned could
be used.

Node failures can affect network reliability in two ways.
First, if a node fails, clearly it cannot communicate with any
other node in the network. Thus, if there are NN nodes in the
network and one fails, a minimum of NN-1 node pairs cannot
communicate, independent of the network structure. In the next
section we establish a simple formula for measuring this effect.
Changing the network configuration has no effect on this
component of network reliability. Second, the failed nodes
destroy some potential communication paths between other pairs
of nodes. Link failures also affect network reliability in this
second way.


In the next section, we survey the reliability situation
for small versions of the ARPANET. In Section 3 we enumerate
several independent pieces of evidence which point out the
increasing role of reliability considerations in larger ARPANETs.
In the final section of the chapter, the implications of this
trend are discussed.

2. Reliability of Small to Medium Networks (NN ≤ 50)

The initial design procedure for the ARPANET controlled
reliability by insisting that there be at least two node disjoint
paths between every pair of nodes. Later computations
proved that this implied almost perfect reliability in the
following sense. Suppose node i in the network is inoperative a
fraction p_i of the time for i=1,...,NN. Then a lower bound for
the expected number of node pairs which cannot communicate is
equal to the expected number of node pairs not communicating in a
complete network where each node pair is joined by an invulnerable
link. No addition or redistribution of links can reduce the
expected number of node pairs not communicating below this value.
For small nets, the existence of two node disjoint paths between
each pair of nodes invariably resulted in an expected number of
node pairs not communicating very near the lower bound. Thus,
the addition of more links for reliability purposes was not
justified. The calculation of this important lower bound is as follows.




Let each node i of a network with NN nodes have a probability
p_i of failing. Then, the expected number of node pairs in
which one or both nodes have failed is

$$\sum_{i<j} \left[ 1 - (1-p_i)(1-p_j) \right].$$

If p_i = p for i=1,...,NN, then the expected number is

$$\frac{\mathrm{NN}(\mathrm{NN}-1)}{2} \left[ 1 - (1-p)^2 \right] = \frac{\mathrm{NN}(\mathrm{NN}-1)}{2} \left( 2p - p^2 \right),$$

and the expected fraction of node pairs with at least one node
failed is 1-(1-p)^2 = 2p - p^2. For small p this is approximately
2p(1-p), the form in which it is quoted in the rest of this chapter.
Two important implications of this simple result deserve to be
emphasized. First, the expected fraction of non-communicating node
pairs cannot be reduced below this value, and second, this lower
bound is invariant with respect to the size of the network.
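
As a quick numerical check of the lower bound, the following Python sketch sums 1-(1-p_i)(1-p_j) over all node pairs. The function names are illustrative and do not come from the report, and independent node failures are assumed, as the product form of the bound implies.

```python
from itertools import combinations


def expected_pairs_with_failed_node(p):
    """Expected number of node pairs in which one or both nodes have failed.

    p is a list of per-node failure probabilities p_i.
    Assumption: nodes fail independently of one another.
    """
    return sum(1.0 - (1.0 - pi) * (1.0 - pj) for pi, pj in combinations(p, 2))


def lower_bound_fraction(p):
    """The same quantity expressed as a fraction of all NN(NN-1)/2 node pairs."""
    nn = len(p)
    return expected_pairs_with_failed_node(p) / (nn * (nn - 1) / 2)


if __name__ == "__main__":
    probs = [0.02] * 23                  # 23 nodes, each failing with p = .02
    print(expected_pairs_with_failed_node(probs))
    print(lower_bound_fraction(probs))
```

For 23 nodes with p = .02 the fraction is 1-(.98)^2 = .0396, and the expected number of affected pairs is .0396 times 253 pairs, or about 10.0188. The fraction does not change when the number of nodes is varied, which is the size invariance noted above.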

To fix these ideas and to give specific examples of the
reliability characteristics of small nets, we consider two
versions of the ARPANET. The first is a 23 node network that
has been thoroughly analyzed as a common measuring point or
standard for the various reliability analysis techniques.
The second network is a medium size network of 33 nodes in
which for the first time an additional link was considered
mainly for reliability reasons. The 23 node network is
represented in Figure 1.1. This design had a yearly line cost of
$847,000 for its 28 lines and a throughput of 9.9 Kbits/node
assuming uniform traffic between nodes. We will assume a
base element failure probability of 0.02, which is a close
approximation to currently measured values. Then the lower bound
1-(1-p)^2 derived above equals 0.0396 for p=.02, and hence the
expected number of node pairs not communicating must be at least
(.0396)(23)(22)/2 = 10.0188. In Figure 1.2 the expected fraction
of node pairs not communicating is shown as a function of element
failure probability. Also shown are the expected fraction
of node pairs not communicating when only links fail and when only
nodes fail, and finally the curve 2p(1-p).
For p = .02, the expected fraction of node pairs not
communicating is 0.04S.

In the case where only nodes fail the expected fraction
is .0427, and for only links failing .0018. Remembering that
the lower bound from node failures alone is .0396, we see that
80% of the node pairs which cannot communicate can be ascribed
purely to the fact that one of the nodes of the pair in question
has failed. Thus, the improvement in reliability to be gained by
changing the network configuration is minor. Nevertheless, several
strategies for improving reliability were examined. The most
vulnerable section of the 23 node network is the long string of
nodes from node 6 (BBN) to node 15 (CASE) along the bottom of
Figure 1.1. The first idea was to add a link from node 13
(BURROUGHS) to node 14 (LINCOLN). The second idea was to
install hardware at the IMPs so that if an IMP failed,
traffic could be routed around it in one direction connecting
two of the incident links. Any remaining links are effectively
blocked. The results of these analyses are shown in Figure 1.3.
For p=.02 the improvement is negligible and does not justify
the cost of implementation, although for higher values of p
the improvement becomes more significant. The expected fraction
of non-communicating node pairs is a purely topological
reliability measure, since it does not completely reflect the
degradation of throughput due to element failures. The most
detailed level of analysis of reliability incorporates element
failures, flow requirements, routing, acceptable delays and
other pertinent network characteristics. In order to test
the adequacy of the ARPANET under the most stringent of
conditions, a reliability analysis treating these factors was
performed. The effect on throughput at an average delay of 0.2
seconds was examined by removing nodes and links from the
network and applying the NAC routing and analysis algorithms
to the remaining network. The nominal throughput of the 23
node network with all elements operable is 11.5 KBPS/node.
When nodes and links are failing with p=.02, the expected
throughput is at least 9.0 KBPS/node. These results again
show that for small networks, reliability is not a dominant
factor.

Figures 1.4 and 1.5 depict a 33 node network. For the
network shown in Figure 1.4 the difference between the expected
fraction of node pairs not communicating, .058, and
2p(1-p) = .040 is almost double the difference for the 23
node network, so that improving the reliability by changing
the network configuration becomes marginally feasible. An
extra link from FT. BEL to ABER increased the cost by a little
over 1% and increased the reliability by almost 10%. The
resulting network is shown in Figure 1.5. Thus, even for a
network with only 33 nodes, it is becoming necessary to consider
reliability in more detail than the "two connectivity"
criterion. For p>.02 it is even more important.

3. Reliability Trends for Large Networks

For smaller networks and low element failure probabilities
(p ≈ .02), it was found that designing the network
with at least two node disjoint paths between each node pair,
for throughput in the range 8-15 kilobits/second/node, guaranteed
sufficient reliability. As networks become larger this simple
approach fails. The first experiments which indicated this
started with low cost networks of 20, 40, 60, 80, 100 and 200 nodes
with throughput approximately 8 KBPS/node, designed by NAC's
network design program with the reliability constraint of two node
disjoint paths. The results are shown in Figure 1.6 for the case
when nodes are perfectly reliable. As measured by the fraction of
node pairs not communicating, the reliability actually increased with
the number of nodes up to 60 nodes, at which point the reliability
began to decrease. As is evident, the decrease in reliability is
dramatic even though nodes have been assumed to be perfect.

Figures 1.7 through 1.10 show the results of analysis of a
family of two and three connected networks containing from 20
nodes to 200 nodes. The networks analyzed contain 20, 40, 60,
80, 100 and 200 nodes; a continuous line is drawn for
visual convenience. On the curve in Figure 1.8 for p=0.2, the
simulation/analysis error is indicated by vertical bars with length
equal to 4 times the standard deviation. If the simulation results
were normally distributed, this would correspond to a 95%
confidence interval. It can be seen from Figures 1.7 to 1.10 that when
there are 3 node disjoint paths between every pair of nodes,
the unreliability is, within the sampling error, close to the
ideal minimum which results from only the node failures, except
for p = .1, where the 3 node disjoint paths curve is just
beginning to depart from the ideal curve. From these results
we can conclude that requiring 3 node disjoint paths between
every pair of nodes is sufficient to essentially guarantee
an optimal reliability with respect to link allocations for
networks with fewer than 200 nodes and for element failure
probabilities of less than 0.1. Whether the use of this
criterion would result in expensive over-design should be
further investigated. In many cases it is clear that this
could occur, so it is worthwhile to develop rapid reliability
analysis methods which can be carried out repeatedly in the
design process. Unfortunately, as fast as
the current reliability analysis techniques have become, it
is still infeasible to employ them in an iterative design
process.

4. Implications for Further Research

Fast, effective methods have been developed for analyzing
the reliability of networks [ARPA Semi-Annual Reports 2, 3
and 4]. Recently, as will be described in the next chapter,
even more efficient analysis techniques have been developed.
While these methods are effective for quite large networks,
they are still too slow for use in an iterative design
procedure. Recursive methods suitable for networks composed
of loops and trees are orders of magnitude faster and offer
hope for use in design. These new methods are described in
the next chapter. These recursive methods can be used in a
hybrid manner with simulation using decomposition techniques.
Networks which can be analyzed by recursion can also be used
as control variates in simulation of general networks.
Research is progressing in these areas.

The selective "hardening" of important nodes in a computer
network is being studied quantitatively. It is clear that the
only way to decrease the 2p(1-p) lower bound on the fraction
of non-communicating node pairs is to increase the reliability
of the nodes themselves. One way of doing this is to put a
backup IMP at each node. Since this is usually prohibitively
expensive, one can select a subset of nodes where backup can
be provided on the basis of a reliability-cost tradeoff.

If for very large networks the cost-reliability tradeoff
is the dominant factor in network design, replacing the
cost-throughput tradeoff, there will obviously need to be dramatic
changes in network design procedures. The surface has barely
been broken in this area.


II.  RECURSIVE  ANALYSIS  OF  NETWORK  RELIABILITY 


1. Introduction and Summary

The network structure of many common communication networks
can be represented as a composite of simple loops and trees.
Reliability analysis of such networks can be carried out very
quickly and efficiently by a new recursion approach described
in this chapter. Moreover, a wide variety of reliability
measures can be obtained using the same general method. The
measures studied here are:

(i)  the  expected  number  of  nodes  communicating  with  a 
central  node  called  a  "root", 

(ii)  the  expected  number  of  node  pairs  communicating, 

(iii)  the  expected  number  of  node  pairs  communicating  by 
a  path  through  the  central  node, 

(iv)  the  probability  that  operating  nodes  can  communicate 
through  the  root, 

(v)  the  probability  that  operating  nodes  are  connected. 
Many  other  measures  are  possible. 

In Figure 2.1 some of the many network structures that
can be analyzed using recursion are illustrated. In addition,
even if a network does not have this precise structure, the
reliability of the network can often be approximated by the
reliability of such a network, or a hybrid computation can be
carried out using recursion on the tree and loop parts of the
network together with simulation for the other parts.
(This generalized approach is now under study.) These
techniques then offer a very powerful tool in the analysis of
network reliability.

2. Terminology

We will develop a very general class of recursive methods
for a wide variety of reliability criteria. To do this it is
very economical to employ a recursive characterization of
rooted trees [Knuth:1968, Section 2.3].

Definition: A rooted tree is a finite set T of one or
more nodes such that:

(a) There is one specially designated node called the
root of the tree, root(T); and

(b) The remaining nodes (excluding the root) are partitioned
into m ≥ 0 disjoint sets T_1, T_2, ..., T_m, and each
of these sets in turn is a rooted tree. The trees T_1, ..., T_m
are called subtrees of the root.


1 


The  terminology  of  Knuth  is  somewhat  different  from  ours. 


As Knuth points out there are several models other than
the obvious one, a tree graph with a distinguished node, but
we will confine ourselves to tree graphs. To make this
association more explicit we introduce some more terminology. The
root of a tree, J, is said to be the father of the root of
each of the subtrees of J. The root, I, of a subtree of J
is said to be a son of J. Figure 2.2 depicts such a rooted
tree graph where links are shown between fathers and their sons.
A link is a pair of nodes one of which is the father of the
other. Thus node 1 is the root of the entire tree. Node 2
is the root of the only subtree of 1 and hence 2 is the son
of 1 and 1 is the father of 2. The corresponding subtree of
1 is determined by the nodes {2,3,4,5,6,7,8,9,10}. Node 2 has
two subtrees on {3,4,5} and {6,7,8,9,10} with roots 3 and 6
respectively. Node 3 has two subtrees {4} and {5}. Node 4
has no subtrees.

Since we will be dealing with computer methods of solution,
it is necessary to impose a linear ordering for storage purposes.
This will be done by a father function. Suppose we have a
network on NN nodes, {1,2,...,NN}, and for each node I except 1 we
have a node F(I), the father of I, such that F(I)<I and (I,F(I))
is a link in the network. Then F defines NA=NN-1 links and in
fact, the existence of a father function F is a necessary and
sufficient condition for the network to be a rooted tree. The
special node 1 (which has no father) is of course the root of
the tree (sometimes called the patriarch). Associated with
each node I is a rooted subtree consisting of nodes with
greater numbers which are connected to I by a path passing
through nodes with labels ≥ I. In Table 2.1 the father function
for the tree in Figure 2.2 is given.
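
The father-function representation can be sketched in Python as follows. Table 2.1 is not reproduced in this text, so the dictionary FATHER below is a hypothetical father function: it agrees with the subtrees named above ({3,4,5} under node 3 and {6,7,8,9,10} under node 6), but the arrangement of nodes 7 to 10 beneath node 6 is an assumption made for illustration. The function names are likewise illustrative.

```python
def is_father_function(father):
    """Check that F(I) is defined and satisfies 1 <= F(I) < I for I = 2..NN.

    father maps each node I in 2..NN to F(I); node 1 is the root and has no entry.
    """
    nn = len(father) + 1
    return all(i in father and 1 <= father[i] < i for i in range(2, nn + 1))


def subtree_nodes(father, root):
    """Return the node set of the rooted subtree associated with `root`."""
    nn = len(father) + 1
    members = {root}
    # Because F(I) < I, an ascending pass meets every father before its sons.
    for i in range(root + 1, nn + 1):
        if father[i] in members:
            members.add(i)
    return members


# Hypothetical father function for a 10-node tree (see the lead-in above).
FATHER = {2: 1, 3: 2, 4: 3, 5: 3, 6: 2, 7: 6, 8: 6, 9: 7, 10: 7}

if __name__ == "__main__":
    print(is_father_function(FATHER))
    print(sorted(subtree_nodes(FATHER, 3)))
    print(sorted(subtree_nodes(FATHER, 6)))
```

Storing only F(I) for each node gives the linear ordering the text asks for: any pass from NN down to 2 visits every son before its father, which is what the recursive computations rely on.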

3. Recursive Computations on Trees

We now want to calculate the reliability of a tree network
assuming the reliabilities of its elements, nodes and links, are
known. It is not immediately obvious what the "reliability of
a tree" should mean; we will consider several meanings. However,
the general approach in each case will be the same. Considering
the tree to be a rooted tree in the sense of Knuth, we associate
a state vector with the root of each of the subtrees. We then
define a set of recursion relations which yield the state vector
of a rooted tree given the state of its subtrees. For subtrees
consisting of single nodes the state is obvious. We then join
the rooted subtrees into larger and larger rooted subtrees
using the recursion relations until the state of the entire
network is obtained.
Deriving the recurrence relations is somewhat mechanical
also. It comes simply from considering the situation depicted
in Figure 2.3. We have two subtrees, one with root I and the
other having as its root J=F(I). We assume the states of I and
J are known and we wish to compute the state of J relative to
the tree obtained by joining I and J by the link (I,J).

To illustrate the technique let us consider the first and
easiest criterion. Namely, we wish to know the expected number
of nodes which can communicate with the root node 1. We assume
we have associated with each node I a probability of node failure
PN(I) and a probability QN(I)=1-PN(I) of the node being present.
Similarly, for the link (I,F(I)) we have probabilities PL(I)
and QL(I) of the link failing and being operative respectively.
The state vector of a subtree with root I is, in this case, a
scalar, S(I), which is the expected number of nodes in the
subtree which communicate with the root I, including I. To derive
the recurrence relations we consider two subtrees with I and
J=F(I) as roots, respectively. We then want to derive the
state of the new subtree obtained by joining I and J together
by (I,J). Let S(I) and S(J) be the known states for the two
subtrees and S(J)' the resulting state. If the link (I,J)
and the node J are operational, S(J)'=S(I)+S(J); if not, then
S(J)'=S(J). Putting the two together we have the recurrence
relation: S(J)'=S(J)+S(I)QN(J)QL(I), where QN(J) is the
probability that node J is operative and QL(I) is the probability
that the link (I,J) is operative. Now all that remains is
to put this in the form of an algorithm:

Step 0: (Initialization) Set S(I)=QN(I) (the probability
that node I is working), I=1,...,NN. Set I=NN. Go to Step 1.

Step 1: Let J=F(I), and set S(J) to S(J)+S(I)QN(J)QL(I);
go to Step 2.

Step 2: Set I to I-1. If I=1, stop; otherwise, go to Step 1.

When the algorithm stops S(1) is the expected number of
nodes communicating with node 1 (counting node 1).
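
Steps 0 to 2 translate almost line for line into a short loop. The sketch below is an illustrative Python rendering rather than the report's FORTRAN code; the function name and the convention of padding index 0 so that lists can be indexed by node number are assumptions.

```python
def expected_nodes_reaching_root(father, qn, ql):
    """Expected number of nodes communicating with root node 1 (counting node 1).

    father[i] is F(i), with F(i) < i, for i = 2..NN
    qn[i]     is the probability that node i is operative
    ql[i]     is the probability that link (i, F(i)) is operative
    All lists are indexed by node number; index 0 (and father[1], ql[1])
    is unused padding.
    """
    nn = len(qn) - 1
    s = list(qn)                        # Step 0: S(I) = QN(I)
    for i in range(nn, 1, -1):          # I = NN, NN-1, ..., 2
        j = father[i]                   # Step 1: J = F(I)
        s[j] += s[i] * qn[j] * ql[i]    # S(J) <- S(J) + S(I) QN(J) QL(I)
    return s[1]


if __name__ == "__main__":
    # A three-node chain 1 - 2 - 3 in which every element works with probability .9.
    father = [0, 0, 1, 2]
    qn = [0.0, 0.9, 0.9, 0.9]
    ql = [0.0, 0.0, 0.9, 0.9]
    print(expected_nodes_reaching_root(father, qn, ql))
```

For the chain in the example, node 1 is counted with probability .9, node 2 with probability (.9)^3 and node 3 with probability (.9)^5, so the result should equal the sum of those three terms.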

For our next criterion we compute the expected number of
node pairs communicating. For this criterion we utilize a two
dimensional state vector. We will use, as before, S(I) to be
the expected number of nodes in the subtree which communicate
with I, and a new state component T(I) which is the expected
number of node pairs communicating in the subtree. The
recursion relation for S(J) is as before: S(J)'=S(J)+S(I)QN(J)QL(I).
The recursion relation for T(J) is T(J)'=T(I)+T(J)+S(I)S(J)QL(I),
since we have the same pairs communicating as before and, if the
link (I,J) is operating, S(I) nodes in one tree can communicate
with S(J) nodes of the other for S(I)S(J) additional node pairs.
The resulting algorithm is:

Step 0: (Initialization) Set S(I)=QN(I), T(I)=0, I=1,...,NN.
Set I=NN. Go to Step 1.

Step 1: Let J=F(I); set T(J) to T(I)+T(J)+S(I)S(J)QL(I), and
then set S(J) to S(J)+S(I)QN(J)QL(I). Go to Step 2.

Step 2: Set I to I-1. If I=1, stop; otherwise, go to Step 1.

T(1) ends up with the desired result. Note that in Step 1,
T(J) must be updated before S(J).
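
The two-component recursion can be sketched in Python as below. This is an illustrative rendering, not the report's code: the function name and the padded, node-indexed lists are assumptions. The update of T(J) is placed before that of S(J) so that it uses the old value of S(J).

```python
def expected_pairs_communicating(father, qn, ql):
    """Expected number of node pairs communicating in a tree with root 1.

    father[i] is F(i) < i for i = 2..NN; qn[i] and ql[i] are the probabilities
    that node i and link (i, F(i)) are operative. Index 0 is unused padding.
    """
    nn = len(qn) - 1
    s = list(qn)                 # Step 0: S(I) = QN(I)
    t = [0.0] * (nn + 1)         #         T(I) = 0
    for i in range(nn, 1, -1):   # I = NN, ..., 2
        j = father[i]
        # T(J) first: it needs the value of S(J) from before this join.
        t[j] = t[i] + t[j] + s[i] * s[j] * ql[i]
        s[j] = s[j] + s[i] * qn[j] * ql[i]
    return t[1]


if __name__ == "__main__":
    # Root 1 with two sons, 2 and 3; the element reliabilities are made up.
    father = [0, 0, 1, 1]
    qn = [0.0, 0.9, 0.8, 0.7]
    ql = [0.0, 0.0, 0.95, 0.85]
    print(expected_pairs_communicating(father, qn, ql))
```

In the example the three pairs can be checked by hand: pair (1,2) needs nodes 1 and 2 and their link, pair (1,3) needs nodes 1 and 3 and their link, and pair (2,3) needs all three nodes and both links.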

In many real systems node pairs can communicate only through
the root. So for our next criterion, we consider the expected
number of node pairs which are connected by a path through the
root. To analyze this case we consider a state component R(I)
in place of T(I), where R(I) is the expected number of node
pairs (pairs including I are allowed) both of which are
connected to the root node I. S(I) has the same meaning as before.
The recurrence relation for S(I) also remains unchanged. The
recurrence relation for R(I) is R(J)'=R(J)+(S(I)S(J)+R(I)QN(J))QL(I).
The algorithm needs only to be modified by changing the recurrence
relation for T(J) in Step 1 to the one for R(J). The state
components for this last criterion are illuminating. For if one
knows the number of nodes connected to the root, say n, then
the number of node pairs communicating through the root is
n(n-1)/2. This would seem to imply that either S(I) or R(I)
could be eliminated and a state vector with one component would
be possible. This is not the case because the expectation
operation does not commute with squaring; that is, Exp[n(n-1)/2]
≠ (Exp n)(Exp n - 1)/2, in general, for n random.
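
A small Python sketch makes the last point concrete: it carries both S and R through the recursion and shows that R(1) differs from S(1)(S(1)-1)/2. The function name and the padded, node-indexed lists are illustrative assumptions, not part of the report.

```python
def pairs_through_root(father, qn, ql):
    """Return (S(1), R(1)) for a tree with root 1.

    S(1) is the expected number of nodes connected to the root, and R(1) is the
    expected number of node pairs both of which are connected to the root.
    father[i] is F(i) < i; qn[i] and ql[i] are the probabilities that node i
    and link (i, F(i)) are operative. Index 0 is unused padding.
    """
    nn = len(qn) - 1
    s = list(qn)                 # S(I) = QN(I)
    r = [0.0] * (nn + 1)         # a single node contains no pairs
    for i in range(nn, 1, -1):
        j = father[i]
        # R(J) is updated first so that it sees the old value of S(J).
        r[j] = r[j] + (s[i] * s[j] + r[i] * qn[j]) * ql[i]
        s[j] = s[j] + s[i] * qn[j] * ql[i]
    return s[1], r[1]


if __name__ == "__main__":
    # Three-node chain 1 - 2 - 3, every element operative with probability .9.
    s1, r1 = pairs_through_root([0, 0, 1, 2],
                                [0.0, 0.9, 0.9, 0.9],
                                [0.0, 0.0, 0.9, 0.9])
    print(r1)                    # expected number of pairs connected to the root
    print(s1 * (s1 - 1) / 2)     # naive value from the expected node count
```

For the chain in the example, the two printed values differ noticeably, which is why both state components have to be carried.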

We now turn to a class of reliability criteria related to
whether the network is connected or not. The first result is
immediate: the probability QC of the tree being connected is

$$QC = \prod_{I=1}^{NN} QN(I) \, \prod_{I=2}^{NN} QL(I). \qquad (1)$$

If we don't insist that the entire network be connected but only
that the subnetwork involving operative nodes be connected, we get a
new probability QC. The calculation is more interesting in this
case. Here we need a state vector for each subtree with 3
components. They are:

N(I) - The probability that all nodes in the subtree are
failed.

C(I) - The probability that the (non-null) set of operative
nodes, including the root of the subtree, is connected.

B(I) - The probability that the root of the subtree is
failed and the (non-null) set of operative nodes in the subtree
is connected.

N(I), C(I), and B(I) account for all tree networks whose
operative nodes communicate.


The recurrence relations in this case are:

(2a)  C(J)' = C(I)C(J)QL(I) + C(J)N(I)

(2b)  N(J)' = N(I)N(J)

(2c)  B(J)' = B(J)N(I) + B(I)N(J) + C(I)N(J)

As we mentioned before, often in practical situations all
communication has to take place through the root node. So
another interesting reliability condition is the probability,
QR, that all operating nodes can communicate with the root.

As can be seen from the definition of C, QR=C(1)+N(1).

An algorithm for obtaining both criteria is:

Step 0: (Initialization) Set N(I)=PN(I), C(I)=QN(I), B(I)=0,
I=1, ..., NN. Set I=NN. Go to Step 1.

Step 1: Let J=F(I). Using equations (2), recalculate B(J),
C(J), and N(J), in that order. (Note that the order of
calculations is important as calculations should be done with the
old values of B(J), C(J), and N(J).) Go to Step 2.

Step 2: Set I=I-1. If I=1, stop; otherwise, go to Step 1.

After the algorithm terminates, we obtain the probability
of all operating nodes communicating by QC=C(1)+B(1)+N(1) and
the probability of all operating nodes communicating with the
root by QR=C(1)+N(1).

We summarize the various algorithms in Table 2.2. The
algorithms for finding the reliability measures discussed in
this section were coded in FORTRAN IV and executed on a
CDC-6600. The average running time for a 500 node tree was
1.5 seconds.
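
The recursion for QC and QR can be sketched in a few lines. This is a minimal illustration, not the FORTRAN IV code referred to above. It assumes the nodes are numbered 1..NN with the root as node 1, that F[I] < I is the father of node I, and that QL[I] is the probability that the link from I to F[I] is operative; index 0 of each list is unused, and the function name is hypothetical.

```python
def connectivity_probabilities(F, QN, QL):
    """Return (QC, QR) for a tree using recurrences (2a)-(2c)."""
    NN = len(QN) - 1
    N = [0.0] + [1.0 - QN[i] for i in range(1, NN + 1)]  # N(I) = PN(I)
    C = [0.0] + [QN[i] for i in range(1, NN + 1)]        # C(I) = QN(I)
    B = [0.0] * (NN + 1)                                 # B(I) = 0
    for i in range(NN, 1, -1):          # I = NN, NN-1, ..., 2
        j = F[i]
        # All three new values are computed from the old values at J.
        b = B[j] * N[i] + B[i] * N[j] + C[i] * N[j]      # (2c)
        c = C[i] * C[j] * QL[i] + C[j] * N[i]            # (2a)
        n = N[i] * N[j]                                  # (2b)
        B[j], C[j], N[j] = b, c, n
    return C[1] + B[1] + N[1], C[1] + N[1]

# Two nodes joined by one link, all probabilities 0.5.
QC2, QR2 = connectivity_probabilities([0, 0, 1], [0, 0.5, 0.5], [0, 0, 0.5])
# Three-node chain 1-2-3, all probabilities 0.5.
QC3, QR3 = connectivity_probabilities([0, 0, 1, 2],
                                      [0, 0.5, 0.5, 0.5],
                                      [0, 0, 0.5, 0.5])
```

Both small cases can be verified by enumerating the node and link states by hand.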

4. Trees with Weighted Nodes

In the previous section it was assumed that the nodes in
the tree were all equal. In many cases it is desirable to
assign a weight, W(I), to each node, I. As an example, instead
of wishing to calculate the expected number of nodes
communicating with the root, suppose we desired the expected amount of
traffic which could reach the root, where each node, I, generates
W(I) units of traffic. To calculate this the state variable
is S, just as before, the only difference being that the initial
conditions S(I)=W(I)QN(I) replace the old initial conditions
S(I)=QN(I). (W(I) could also represent the number of terminals
at node I.)

It is possible, by the use of a weighting function, to
extend the algorithms of the previous section to include the
case where the "nodes" of the tree themselves represent trees,
or indeed, more highly connected graphs. In this case, in
Step 0, we initialize the state vector of each "node" of the
network to the value of the state vector of the subnetwork we
are treating as a "node." Thus, in the previous example, we
initialize the value of S(I) to the expected number of nodes
communicating with node I in the subnetwork we are treating
as a "node." In general it may be possible to obtain these
values analytically if the graphs are small, or it may be
necessary to obtain them by simulation or some other means.
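
The weighted initial condition can be illustrated with a short sketch. It assumes the recurrence S(J)' = S(J) + S(I)QN(J)QL(I), which is the form used in the worked example later in this chapter, and the same numbering conventions as Section 3 (root is node 1, F[I] < I, index 0 unused). The function name is hypothetical.

```python
def expected_traffic(F, QN, QL, W):
    """Expected traffic reaching the root when node I generates W[I] units."""
    NN = len(F) - 1
    # Weighted initial conditions S(I) = W(I) QN(I).
    S = [0.0] + [W[i] * QN[i] for i in range(1, NN + 1)]
    for i in range(NN, 1, -1):
        j = F[i]
        # Traffic from subtree I reaches J only if link I and node J work.
        S[j] += S[i] * QN[j] * QL[i]
    return S[1]

# Root generates 3 units, its single son generates 5 units.
traffic = expected_traffic([0, 0, 1], [0, 0.5, 0.5], [0, 0, 0.5], [0, 3, 5])
```

Setting every W[I] to 1 recovers the expected number of nodes communicating with the root.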

5. Extension to General Networks

In network design it is common practice to reinforce the
connections among a key set of central nodes, especially in the
case where all communication must take place through these nodes.
An example of the simplest configuration of this type,
where the central nodes are connected in a cycle, is shown
in Figure 2.4.

The algorithms we have considered can be easily extended
to handle such networks. Note first that without any
modification to the algorithms, the network shown in Figure 2.4 can be
reduced to a comparatively simple network consisting of the
central nodes only. We would consider each central node as
the root of a separate tree and analyze the tree using the
algorithms of Section 3. When the algorithm terminates, the
state vector at that node would reflect the structure of the
entire tree rooted at the node. Analysis could then be carried
out, either analytically or by simulation, on the simplified
network of central nodes.

If the simplified network is a loop, we can use the
algorithms of Section 3 to analyze it by making the following
observation: If any component in a loop fails the resulting
network is a chain. A chain is a special kind of tree and
can be analyzed using the recursive method.

Suppose we are given a cycle C_N, containing N elements
(nodes and links) with ordering on the elements so that they
are numbered e_1, e_2, ..., e_N in a clockwise direction starting
from some element, and that we desire to evaluate a reliability
criterion, RL(C_N)=RL_N, on C_N. Consider RL_N^1 and RL_N^2 where:

RL_N^1 = RL_N given e_N is operative and

RL_N^2 = RL_N given e_N is failed.

Therefore RL_N = RL_N^1 QE(N) + RL_N^2 PE(N) where QE(N) is the probability
that the element, e_N, works and PE(N) is the probability it fails.
RL_N^2 is easily evaluated by previous methods as the resulting
network is a chain. To evaluate RL_N^1, consider RL_{N-1}^1 and RL_{N-1}^2
where:

RL_{N-1}^1 = RL_N^1 given e_{N-1} is operative and

RL_{N-1}^2 = RL_N^1 given e_{N-1} is failed.

Therefore RL_N^1 = RL_{N-1}^1 QE(N-1) + RL_{N-1}^2 PE(N-1). This procedure
can be repeated to yield a sequence RL_j^2 and RL_1^1 which are
defined on disjoint segments of the total probability space and
can, therefore, be summed to yield the desired value, RL_N. All
of these values, with the exception of RL_1^1, can be evaluated in
terms of chains and can therefore be evaluated as before. RL_1^1
is defined on the cycle with all components operative, and
is therefore easily evaluated. For example, if the cycle is
composed of N nodes with weights, W(I), and if RL is the
expected number of node pairs communicating then RL_1^1 is simply

\sum_{I=1}^{N-1} \sum_{J=I+1}^{N} W(I)W(J).

Note also that the calculations of the RL_j^2 can be simplified by
the observation that two adjacent operating elements e_I and
e_{I+1} can be replaced by an equivalent element e* with:

W(*) = QE(I)W(I) + QE(I+1)W(I+1) and
QE(*) = QE(I)QE(I+1).

This procedure replaces the evaluation of RL on a cycle
with N evaluations of RL on chains. The order of computation
is thus increased by a factor of N. The results can be extended
still further to networks containing more than one cycle, but
the order of computation will be increased in general by a factor
of N_c (the number of elements in the cycle) for each cycle in
the network and will become excessive unless N or the number
of cycles is small.
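
The conditioning argument can be tried out on a deliberately simplified case. The sketch below is an assumption-laden illustration, not the procedure as coded by the authors: the nodes are taken to be perfectly reliable, the elements e_1, ..., e_N are the N links of the cycle, and the criterion RL is the probability that all nodes can communicate. All function names are hypothetical. The result is compared with a brute-force enumeration of link states.

```python
from itertools import product

def chain_connected(qe):
    """Probability that a chain with link probabilities qe is connected."""
    prob = 1.0
    for q in qe:
        prob *= q
    return prob

def cycle_connected(qe):
    """Condition on e_N, e_(N-1), ... in turn; each failure leaves a chain."""
    total = 0.0
    prefix = 1.0   # probability that all elements conditioned so far work
    for k in range(len(qe) - 1, -1, -1):
        # e_k failed, later elements operative: evaluate RL on the chain
        # formed by the remaining, not yet conditioned, elements.
        total += prefix * (1.0 - qe[k]) * chain_connected(qe[:k])
        prefix *= qe[k]
    return total + prefix   # last term: all elements operative, RL = 1

def cycle_connected_brute(qe):
    """Enumerate link states; the cycle survives at most one failed link."""
    total = 0.0
    for state in product((0, 1), repeat=len(qe)):
        if state.count(0) <= 1:
            prob = 1.0
            for up, q in zip(state, qe):
                prob *= q if up else 1.0 - q
            total += prob
    return total

qe = [0.9, 0.8, 0.7, 0.6]
recursive_value = cycle_connected(qe)
brute_value = cycle_connected_brute(qe)
```

The recursive version performs N chain evaluations, in line with the factor of N noted above.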

The same procedure is effective in analyzing networks of
the form shown in Figure 2.5. It can be used first on each of
the outer loops to obtain the expected number of node pairs
communicating within the given loop and the expected
number of nodes in the loop which can communicate with the
element e [subscript illegible in source].
These can then be used as initial conditions for S(I) and T(I)
in the analysis of the central loop. If there are n nodes in
the inner loop and K nodes in each of the outer loops, the
entire procedure can be carried out in K^2 n + n^2 steps.

Point Evaluation Versus Functional Evaluation

The calculations in the algorithms can be carried out in
two ways. In the first way link and node probabilities, PL(I),
QL(I), PN(I), QN(I), can be considered as numbers and the
reliability criterion can be evaluated as a number. The
evaluation can also be functional; that is, the reliability of the
subtrees can be represented as polynomial functions of the link
and node probabilities. This approach will of course require
much more storage. The storage requirements are considerably
reduced if all the node probabilities have the same value PN=1-QN
and all the link probabilities have the same value PL=1-QL. In
this case the various state components can be represented by
the coefficients of power series in QN and QL.

As an example we carry out the calculations for the network
in Figure 2.6 using as our criterion the expected number of
node pairs communicating. We assume all links are operative
with probability QL=p and all nodes operative with probability
QN=q.


Initialization: S(I)=q, T(I)=0, I=1, 2, 3, 4, 5, 6.

I=6:  J=F(6)=3

    T(3) := T(3)+T(6)+S(6)S(3)p
          = 0+0+qqp
          = q^2 p

    S(3) := S(3)+S(6)qp
          = q+qqp
          = q+q^2 p

I=5:  J=F(5)=3

    T(3) := T(3)+T(5)+S(5)S(3)p
          = q^2 p+0+q(q+q^2 p)p
          = 2q^2 p+q^3 p^2

    S(3) := S(3)+S(5)qp
          = q+q^2 p+qqp
          = q+2q^2 p

I=4:  J=F(4)=2

    T(2) := T(2)+T(4)+S(4)S(2)p
          = q^2 p

    S(2) := S(2)+S(4)qp
          = q+q^2 p

I=3:  J=F(3)=1

    T(1) := T(1)+T(3)+S(3)S(1)p
          = (2q^2 p+q^3 p^2)+(q+2q^2 p)qp
          = 3q^2 p+3q^3 p^2

    S(1) := S(1)+S(3)qp
          = q+(q+2q^2 p)qp
          = q+q^2 p+2q^3 p^2

I=2:  J=F(2)=1

    T(1) := T(1)+T(2)+S(2)S(1)p
          = (3q^2 p+3q^3 p^2)+q^2 p+(q+q^2 p)(q+q^2 p+2q^3 p^2)p
          = 5q^2 p+5q^3 p^2+3q^4 p^3+2q^5 p^4

    S(1) := S(1)+S(2)qp
          = (q+q^2 p+2q^3 p^2)+(q+q^2 p)qp
          = q+2q^2 p+3q^3 p^2


Note that the highest order polynomial, q^5 p^4, corresponds
to the longest path between two nodes [...]. Note also
that all terms in S(1) and T(1) [...].
Thus, we could have simplified the calculations by using
an equivalent tree with invulnerable links and nodes with
probability of operation r=pq, except for the root, which
still has probability of operation q.
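
The worked example can be checked by point evaluation. The sketch below assumes the father function read off the example (F(2)=1, F(3)=1, F(4)=2, F(5)=3, F(6)=3) and applies the two recurrences used there; the function name is hypothetical and index 0 is unused.

```python
def tree_pair_expectation(F, p, q):
    """Return (S(1), T(1)) for a tree with uniform QL=p and QN=q."""
    NN = len(F) - 1
    S = [0.0] + [q] * NN
    T = [0.0] * (NN + 1)
    for i in range(NN, 1, -1):
        j = F[i]
        T[j] = T[j] + T[i] + S[i] * S[j] * p   # uses the old value of S(J)
        S[j] = S[j] + S[i] * q * p
    return S[1], T[1]

F = [0, 0, 1, 1, 2, 3, 3]      # F[I] for I = 1..6, root has F[1] = 0
p, q = 0.9, 0.8
S1, T1 = tree_pair_expectation(F, p, q)

# Closed forms obtained by functional evaluation in the text.
S1_poly = q + 2 * q**2 * p + 3 * q**3 * p**2
T1_poly = 5 * q**2 * p + 5 * q**3 * p**2 + 3 * q**4 * p**3 + 2 * q**5 * p**4
```

The numerical values agree with the polynomials for any choice of p and q between 0 and 1.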


III. A NEW ALGORITHM FOR MINIMUM SPANNING TREE CALCULATION
1. Introduction and Summary

A minimum spanning tree (also known as a shortest
tree) is a tree in a network whose total sum of link
lengths (or costs) is as small as possible. Finding a
minimum spanning tree is one of the most common and most
important calculations in network analysis. Minimum
spanning trees have been shown to be useful in reliability analysis
(a new application), least cost electrical wiring, minimum
cost connecting communication and transportation networks,
minimum stress networks, clustering and numerical taxonomy,
travelling salesman problems, multiterminal network flows,
and Telpak routing.

Currently the most favored algorithm for finding
minimum spanning trees is one due to Prim [1957] and
Dijkstra [1959]. This algorithm takes on the order of n^2
computations where n is the number of nodes. It is simple
to code, conservative of computer storage and is the fastest
known method for complete networks. However, it has the
unfortunate characteristic that the number of operations is
not substantially reduced when the network is sparse, that is
when the ratio of links to nodes is small as is the case in
most practical networks such as ARPANET. Moreover, it is
too inflexible for use in network reliability applications.

This led NAC to re-examine an earlier solution approach
due to Kruskal. By judicious use of list processing techniques
and modern sorting techniques, the computation for this method
became of the order m log m where m is the number of links.

For complete networks m = n(n-1)/2 and Prim's algorithm
is faster. However, in many applications, in particular
in the reliability analysis of the ARPA network, m ≈ 2n, in
which case NAC's version of Kruskal's Algorithm [3rd and
4th Semi-annual reports] is much more efficient. Moreover,
NAC's version of Kruskal's algorithm is much more flexible
although at the cost of increased complexity of the algorithm.

Here we report on a dramatic improvement of Kruskal's
algorithm which makes it competitive with Prim's for complete
networks also. Thus NAC's version of Kruskal's algorithm is
competitive in computation time for nearly complete networks,
is much superior for sparse networks and is much more flexible.
The only remaining advantage for Prim's algorithm is that in
certain situations, the storage requirement for nearly
complete networks is less for Prim's algorithm than for the
Kruskal algorithm.

To be more specific we consider a network with nodes
N = {1, ..., n} and a set of m links A. Furthermore, each
link, (i,j) ∈ A going from i to j, has a length ℓ_ij associated
with it. We then ask what is the spanning tree of shortest
total length for N. A generalization we will also consider
is to find the shortest spanning forest with a fixed number,
k, of components.

These very simple problems in graph theory have many
practical applications. The most obvious application of
minimum length spanning trees (MSTs) is to minimum connecting
networks. Thus, if one wants to connect n points using the
shortest network the solution is an MST (assuming there is no
cycle or link with negative length). This fact has been
used in transportation problems, communication design problems,
and problems of wiring points together using minimum wiring
in electronic wiring problems [Loberman and Weinberger: 1957].

Kalaba [1964] considered the following type of reliability
problem on a network. Suppose with each link (i,j) there
is associated a stress s_ij. The problem is to find a minimum
stress path connecting two given nodes; that is, a path
connecting the two nodes such that the maximum stress for a
link on the chain is minimized over all chains connecting
the two given nodes. It turns out that the path between
two nodes determined by an MST is a minimum stress path.


An application in which minimum spanning forests are of
interest is in clustering analysis under the name of single
linkage cluster analysis [Gower and Ross: 1969] [Zahn: 1971].

Suppose we have a set of points, S, and a function ρ(i,j)
which is a measure of the similarity of the points i and j.

A family of subsets {C_m} of S forms a δ family of clusters if
for each cluster C_m and each pair of nodes i and j in C_m
there is a sequence i=i_1, ..., i_K=j with ρ(i_k, i_{k+1}) ≤ δ for
k=1, ..., K-1 and for every pair of nodes i and j in different
clusters ρ(i,j) > δ. For a given δ the δ family is unique
and corresponds to the components of a minimal spanning
forest over all spanning forests with the same number of
components.

A final application which motivated our interest is Monte
Carlo simulation of network reliability. Suppose we have
a network in which the links have a probability p of failing
or q=1-p of not failing. We wish to investigate the
probability of the network "failing." The network "fails"
if it becomes disconnected. In any but the simplest cases,
exact analysis is prohibitively difficult. Monte Carlo
simulation then becomes attractive [Van Slyke and Frank: 1972].
The straightforward approach is to generate a random number
r_ij for each link (i,j); if the random number r_ij is
greater than q the link is removed; otherwise it stays in.
The resulting subnetwork is then examined to see if it is
connected. The procedure is then repeated and an estimate
is generated in the obvious way.

However, in most practical situations the probability
of the network being disconnected is desired for a range of
values for q. Suppose we want to find the probability the
network is disconnected, h(q), for all q between 0 and 1 by
Monte Carlo simulation. A possible method is to take the
link with the smallest r_ij, then the link with the next
smallest r_ij, and so on, until a connected graph is obtained.
Let the last link have r_ij = q^0. Then for q < q^0 the network
is disconnected for this one sample and for q ≥ q^0 the
network is connected. Thus we get one sample for every
value of q. We then generate a new set of random numbers
for the links and obtain a second sample for each value of
q and continue until the variance of the estimate is
sufficiently small. We then use the fraction of the times the
network was not connected as an estimate for h(q). It turns
out that q^0 can be efficiently determined by finding a
minimum spanning tree using r_ij as the length of link (i,j).
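
The sampling scheme can be sketched as follows. This is an illustrative sketch under stated assumptions, not NAC's program: nodes are numbered 0..n-1, the r values are uniform on [0, 1), links are added in increasing order of r while component labels are merged by simple relabeling, and the names critical_q and estimate_h are hypothetical.

```python
import random

def critical_q(n, links, r):
    """Smallest q for which this sample is connected (None if never)."""
    if n <= 1:
        return 0.0
    label = list(range(n))          # one component label per node
    components = n
    for idx in sorted(range(len(links)), key=lambda a: r[a]):
        i, j = links[idx]
        li, lj = label[i], label[j]
        if li != lj:                # link joins two components
            label = [lj if x == li else x for x in label]
            components -= 1
            if components == 1:
                return r[idx]       # longest link of the spanning tree
    return None

def estimate_h(n, links, q_values, samples, seed=0):
    """Estimate h(q), the probability of disconnection, for each q."""
    rng = random.Random(seed)
    counts = [0] * len(q_values)
    for _ in range(samples):
        r = [rng.random() for _ in links]
        q0 = critical_q(n, links, r)
        for t, q in enumerate(q_values):
            if q0 is None or q < q0:
                counts[t] += 1      # sample is disconnected at this q
    return [c / samples for c in counts]

triangle = [(0, 1), (1, 2), (0, 2)]
q0_example = critical_q(3, triangle, [0.2, 0.9, 0.5])
h = estimate_h(3, triangle, [0.0, 0.5, 1.0], 500)
```

Each set of random numbers yields one sample for every value of q at once, which is the saving described above.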

MSTs have also been applied to multiterminal network
flow analysis [Gomory and Hu: 1961] and to the solution
of traveling salesman problems [Held and Karp: 1970]
[Held and Karp: 1971].


2. A History of Minimum Spanning Tree Calculations

The first major contribution to the theory of MSTs
was by Kruskal [Kruskal: 1956] although Choquet in 1953
and, according to Kruskal, Boruvka in 1926 did some
earlier work. Kruskal's major contribution
was to show that a "greedy" algorithm [Edmonds: 1971]
could be used to find minimum spanning trees. Specifically,
he showed that an MST may be obtained by repeating the
following step:

Take the shortest link which has not been
chosen or discarded; if it does not form
a cycle with some of the previously chosen
links, add it to the chosen links; otherwise
discard it.

This algorithm is deceptively simple. The means of
implementation on a computer is not obvious. The first
attempts [Obruca: 1964] were of the following form: given
an n x n matrix of link lengths, find the smallest
entry. Do a labeling procedure to find if the link forms
a cycle with the previously chosen links; if it does not,
save the link; otherwise discard it. In either case, make
the length of the link plus infinity and repeat the process.
Searching the matrix and examining for loops involves on the
order of n^2 comparisons which must be done on the order
of n times so the running time for the algorithm in this
form is cubic in n.

Shortly thereafter Prim [1957] and Dijkstra
[1959] proposed an algorithm which takes on the
order of n^2 operations. It is based on the following
theorem: a tree is an MST if and only if for every S ⊂ N
there is in the tree a link of shortest length among all
those connecting a node in S to a node in N-S [Rosenstiehl:
1967]. At every step of the algorithm we have a subset of
the nodes S and a minimum spanning tree on S. We then find
a shortest link (i,j) with i ∈ S and j ∈ N-S and add j to S
and repeat. Slightly more formally the algorithm is:


Prim's Algorithm:

Step 0 (Initialization): Set S_1 = {1}, k=1, d_1=0, d_j = ℓ_{1,j}
if (1,j) ∈ A, d_j = ∞ otherwise. Set f_1=0, f_j=1 if (1,j) ∈ A,
f_j=0 otherwise. Go to Step 1.

Step 1 (Enlarge S_k by one node): Let d_{j*} = Min {d_j : j ∈ N-S_k}.
If d_{j*} = ∞ go to Step 3; otherwise set S_{k+1} = S_k ∪ {j*} and go
to Step 2.

Step 2 (Update distances across cut): For j ∈ N-S_{k+1}
set d_j = Min [d_j, ℓ_{j*,j}] and set f_j = j* if d_j = ℓ_{j*,j}.
Set k=k+1. If k=n stop. Otherwise go to Step 1.

Step 3 (Network not connected, start new component): Let j*
be any node in N-S_k. Set f_{j*}=0 and S_{k+1} = S_k ∪ {j*}, and go
to Step 2.

The total number of operations in both Step 1 and Step 2 is
quadratic in n. Moreover, since the number of links in a complete
graph, n(n-1)/2, is also quadratic in n and since in general all
links must be examined, the order of computation cannot be
reduced to a lower order than quadratic for complete graphs.
Unfortunately, even if the graph is not complete, the order of
calculation is still quadratic since the minimization in Step 1
cannot be simplified in an easy way to take advantage of a
network which is sparse, i.e., with m/n small.
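
A compact sketch of the quadratic procedure is given below. It is an illustration under stated assumptions rather than a transcription of the steps: nodes are numbered 0..n-1 instead of 1..n, a father value of -1 (instead of 0) marks the first node of each component, and link lengths are supplied in a dictionary keyed by node pairs. The function name is hypothetical.

```python
import math

def prim(n, length):
    """Return (f, total): father of each node and total tree length."""
    INF = math.inf
    ell = [[INF] * n for _ in range(n)]      # dense length matrix
    for (i, j), value in length.items():
        ell[i][j] = ell[j][i] = min(ell[i][j], value)
    in_s = [False] * n
    d = [INF] * n                            # distances across the cut
    f = [-1] * n
    d[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Step 1: node outside S with the smallest d (a scan of n values).
        j_star = min((j for j in range(n) if not in_s[j]),
                     key=lambda j: d[j])
        if d[j_star] == INF:
            f[j_star] = -1                   # Step 3: start a new component
        else:
            total += d[j_star]
        in_s[j_star] = True
        # Step 2: update distances across the new cut.
        for j in range(n):
            if not in_s[j] and ell[j_star][j] < d[j]:
                d[j] = ell[j_star][j]
                f[j] = j_star
    return f, total

lengths = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 4.0, (2, 3): 3.0, (1, 3): 5.0}
fathers, total_length = prim(4, lengths)
```

Both the scan in Step 1 and the update in Step 2 touch every node on each pass, whatever the number of links, which is the source of the quadratic cost even for sparse networks.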

The next stage of development was to realize that if the
link lengths in Kruskal's Algorithm were presorted, efficient
sort algorithms could be utilized. Conceptually then Kruskal's
Algorithm would take place in two passes. First, the links are
sorted with respect to length. This takes on the order of m log2 m
operations. Then the links are introduced in order of length
until a spanning tree is obtained. It turns out that the order
of computation of the second pass is dominated by the computations
involved in the first pass. Thus, if the networks are sparse
and, say, the number of links, m, grows linearly with n rather
than quadratically, Kruskal's Algorithm becomes faster than Prim's.
On the other hand, for complete graphs m log2 m looks like
[n(n-1)/2] log2 [n(n-1)/2] which grows faster than the order of
computation n^2 required in Prim's Algorithm.

Treesort [Floyd: 1962] [Floyd: 1964] [Williams: 1964] is
particularly useful for use with Kruskal's Algorithm. An informal
description of Treesort along with some of its properties is
given in Appendix A. In the next section we turn to a problem
even simpler than the MST problem: namely, that of finding out
whether a network is connected or not. The solution to this
problem furnishes an efficient procedure for the second pass
in the improved Kruskal Algorithm.

3. Finding Components of a Graph and Spanning Forests

Finding out whether an undirected graph is connected or
not in an efficient manner is not without interest in itself.
In the Monte Carlo simulation of network reliability for a single
value of q, for example, determining the connectivity of a graph
must be carried out thousands of times so it is worthwhile to
find fast algorithms. Given the graph in node adjacency form,
a very efficient method of determining the components is the
following:

Algorithm A:

Step 0 (Initialization): Set i=1, j=1, k=1, S=∅. Label node i=1 with
component label j=1.

Step 1 (Look at new link): Find the next node i' adjacent to i;
if there are none, go to Step 3. If node i' is not already in
a component, go to Step 2. If node i' is already labeled with
a component number, repeat Step 1.

Step 2 (Add a node to current component): Label the node i' with
the current component label j and add the index of the labeled
node to the stack S. Return to Step 1.

Step 3 (Scan a new node): If S is empty go to Step 4. Otherwise
remove a node index i'' from S, set i equal to i'', and go to Step 1.

Step 4 (Current component complete, start a new one): Set k
to k+1. If k>n, we're done; otherwise, if node k is unlabeled,
set i to equal k and set j to j+1, label node k with j, and go to
Step 1. If node k is labeled, repeat Step 4.

This algorithm terminates with each component having a
different label. If the links (i,i') occurring in Step 2 are
saved, one also obtains a spanning forest. The order of
computation is linear in n and m although if the graph is nearly
complete, the number of links m is quadratic in n. If one is
only interested in determining if the graph is connected or not,
the algorithm can be terminated the first time Step 4 is encountered.
This algorithm is probably close to being optimally efficient if
the links are given in node adjacency form. Ushakov [1967] has
proposed a similar algorithm which makes extensive use of logical
operators on vectors which for many computers would allow savings
in storage and computation time. However, he uses a node
adjacency representation in matrix form which requires n^2 storage
locations which may largely be wasted if the graph is sparse.
Moreover, for each node he has to search an n-vector to find
the first non-zero element. This could lead to a number of
operations on the order of n^2 if there is no special machine
instruction for rapidly carrying out this operation. Logically,
the two algorithms are equivalent.
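
Algorithm A can be sketched as follows. The sketch assumes the graph is given as a list of adjacency lists with nodes numbered 0..n-1 (the text numbers them from 1), uses 0 to mean "unlabeled," and pushes the first node of each component onto the stack rather than scanning it separately; the function name is hypothetical.

```python
def label_components(adj):
    """Return (labels, forest): component labels and spanning forest links."""
    n = len(adj)
    label = [0] * n                 # 0 means not yet in a component
    forest = []
    current = 0
    for start in range(n):          # Step 4: look for an unlabeled node
        if label[start]:
            continue
        current += 1
        label[start] = current
        stack = [start]
        while stack:                # Step 3: scan a new node
            i = stack.pop()
            for i2 in adj[i]:       # Step 1: look at each link of node i
                if not label[i2]:
                    label[i2] = current      # Step 2: add node to component
                    forest.append((i, i2))   # saved link of spanning forest
                    stack.append(i2)
    return label, forest

adjacency = [[1], [0, 2], [1], [4], [3], []]
labels, forest_links = label_components(adjacency)
```

Each node is stacked once and each adjacency entry is examined once, so the work is linear in n and m.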


Algorithm A also has the disadvantage that the links
incident to a node must all be scanned before links incident to
other nodes can be worked on. This is necessary in order to
avoid relabeling nodes. For example, this restriction prevents
one from adding in a simple way links to a graph already analyzed.
A slightly slower but much more flexible algorithm [Van Slyke
and H. Frank: 1972] is:

Algorithm B:

Step 0 (Initialization): Start with A_0=∅ and assign each node
a separate component label. Set k=0 and go to Step 1.

Step 1 (New link): Add a link a_k=(i_k,j_k) to A_k to form A_{k+1}
(if there are no remaining links, i.e., A_k=A, stop). Examine the
component labels of i_k and j_k; if they are the same, repeat
Step 1 with k set to k+1. If not, go to Step 2.

Step 2 (Join components): Change all the node labels which
are the same as the label of i_k (including i_k's label) to the
label of j_k. Set k to k+1 and go to Step 1.

The order of computation is dominated by the relabeling
in Step 2 which occurs n-c times where c is the number of
components. Using a straightforward implementation [Berge:
1962] [Berge, Ghouila-Houri: 1965] [Seppanen: 1970]
each time through Step 2 the labels on all n nodes have to be
checked in order to relabel. Thus, on the order of n^2 operations
are involved with relabeling.

In the version of the algorithm used by Van Slyke and Frank
[1972] a list structure was maintained so that only nodes for
which the labels are changed are considered. Further, the
number of nodes in each component was maintained so that it was
possible to change the labels on the smaller of the two components
joined in Step 2. This reduces the maximum order of computation
to n log2 n plus a term linear in m. This increase in speed by
using list structures does incur an expense in storage requirements.
Knuth [1968] and Read [1969] have proposed maintaining component
membership using tree data structures rather than the explicit
relabeling used in Step 2 of Algorithm B. However, in this
approach determining whether a candidate link connects two nodes
in the same or in different components takes several steps
compared to the one comparison required by Algorithm B and is
therefore less efficient.
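
The list-structure version of Algorithm B can be sketched as follows. This is a minimal illustration under stated assumptions, not the Van Slyke and Frank code: nodes are numbered 0..n-1, each label keeps a Python list of the nodes carrying it, and the smaller of the two components is always the one relabeled. The function name is hypothetical.

```python
def add_links(n, links):
    """Process links in the given order; return (labels, forest)."""
    label = list(range(n))               # Step 0: separate labels
    members = [[v] for v in range(n)]    # nodes carrying each label
    forest = []
    for i, j in links:                   # Step 1: next link
        a, b = label[i], label[j]
        if a == b:
            continue                     # same component: one comparison
        if len(members[a]) > len(members[b]):
            a, b = b, a                  # relabel the smaller component
        for v in members[a]:             # Step 2: join components
            label[v] = b
        members[b].extend(members[a])
        members[a] = []
        forest.append((i, j))
    return label, forest

final_labels, forest = add_links(5, [(0, 1), (2, 3), (1, 0), (1, 2), (3, 0)])
```

Because a node is relabeled only when its component at least doubles in size, no node is relabeled more than log2 n times, which gives the n log2 n bound quoted above. Links can be presented in any order, so further links can be added to a graph already analyzed.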


4. New Developments in MST Calculation

Until recently, the most efficient methods for calculating
minimum spanning trees or forests were to use Prim's algorithm
for nearly complete graphs, which involves on the order of n^2
calculations, or to use the Kruskal Algorithm with Treesort and
Algorithm B. The sorting pass takes on the order of m log2 m
calculations while the second pass involves n log2 n dependence on
the number of nodes n and depends linearly on the number of links.
Thus, for sparse graphs where m log2 m is small compared to n^2
the modified Kruskal algorithm will be faster. Important
considerations other than speed of computation will be discussed
in Section 6; here we report on efforts to develop MST algorithms
which are uniformly fast over the full range of sparseness.

The  first  approach  is  to  notice  that  the  main  expense  in 
the  modified  Kruskal  Algorithm  is  in  the  sorting  which  takes  in 
general  miog^m  operations  and  to  notice  that  most  of  the  links 
are  not  considered  because  they  make  cycles  with  shorter  links. 

In Treesort (Appendix A) applied to the list of link lengths,
the list is first arranged into a binary tree which is a "heap";
that is, each link length is no longer than its descendants in
the tree. This takes about m interchanges and 2m comparisons
at the worst. Then the link corresponding to the top of the
heap is considered via Step 1 of Algorithm B for the MST. Then
the link length is deleted from the heap and a new link length
corresponding to link (i,j), say, is taken from the bottom to
the top and the heap restored by a sift-up. The sift-up takes
at most log_2 m - 1 interchanges and at most 2 log_2 m - 2
comparisons to restore the heap. Often the sift-up can be saved
by comparing the component labels of i and j in Algorithm B. If
they are the same, the link forms a cycle with shorter links and
can be discarded immediately. Using this approach, the sorting
cost is on the order of 2m + k log_2 m, where k is the number of
links examined in Algorithm B before a spanning tree is obtained,
since only the links actually considered for the MST are sorted.
In general, one may have to examine all m branches but for nearly
complete graphs this is unlikely. Experimental verification of
this is found in Section 5. There it is shown that using this
further modification to Kruskal's Algorithm, it becomes nearly
as efficient as Prim's Algorithm for complete graphs and is still
much better than Prim's Algorithm for sparse graphs.
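
The partial-sorting idea can be sketched in a few lines of Python.
This is an illustration under stated assumptions, not the report's
own code: the standard heapq module stands in for Treesort, a simple
union-find stands in for the component labels of Algorithm B, and the
function name kruskal_partial_sort is hypothetical. The shortcut of
discarding a cycle-forming link before the sift-up is not reproduced,
because heapq does not expose that step.

```python
import heapq

def kruskal_partial_sort(n, links):
    """Kruskal's algorithm with a heap in place of a full sort.

    links is a list of (length, i, j) tuples, nodes numbered 0..n-1.
    Returns (tree_links, examined), where examined plays the role of k,
    the number of links taken from the heap.
    """
    heap = list(links)
    heapq.heapify(heap)              # establish the heap in O(m) operations
    label = list(range(n))           # union-find parents (assumed stand-in
                                     # for Algorithm B's component labels)

    def find(x):
        while label[x] != x:
            label[x] = label[label[x]]   # path halving
            x = label[x]
        return x

    tree, examined = [], 0
    while heap and len(tree) < n - 1:
        length, i, j = heapq.heappop(heap)   # O(log m) per link examined
        examined += 1
        ri, rj = find(i), find(j)
        if ri != rj:                 # different components: keep the link
            label[ri] = rj
            tree.append((length, i, j))
    return tree, examined
```

Only the links actually popped are ever ordered, so a dense graph
whose short links already span the nodes leaves most of the heap
untouched.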

However, Prim's method can be improved also. Here we follow
the approach due to E. Johnson [1972] who applied the idea to
Dijkstra's shortest path algorithm. We use Treesort to determine
d_j* = Min d_j in Step 1 of Prim's Algorithm. We assume the d_j form
a "heap" (see Appendix A). The top of the heap is d_j*, which
is then removed from the heap. Then, in Step 2 of Prim's Algorithm
some of the d_j become smaller and are modified. Next, a d_j from
the bottom of the heap is moved to the top. Finally, the heap
is restored. At the worst, each restoration of the heap takes
a number of operations linear in n and usually considerably less
is needed, especially if the network is sparse. Even if it is
not sparse, many of the d_j do not change; furthermore, all but
one of the d_j which change decrease in value so that the
"sift-up procedure" of Treesort takes on a particularly simple
form (see Appendix A). In the next section we present the results
of numerical experiments on these algorithms.
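
A heap-based Prim's Algorithm can be sketched as follows. This is a
simplified variant, not the procedure described above: instead of
modifying the d_j in place and restoring the heap, it pushes a new
entry whenever a d_j decreases and skips stale entries when they
reach the top (often called lazy deletion). The function name
prim_heap and the adjacency-list input format are assumptions made
for the example.

```python
import heapq

def prim_heap(n, adj, root=0):
    """Prim's algorithm with a heap holding candidate distances d_j.

    adj[i] is a list of (length, j) pairs for the links at node i.
    Returns the tree links as (length, parent, node) tuples.
    """
    in_tree = [False] * n
    d = [float("inf")] * n           # best known link length into each node
    d[root] = 0.0
    heap = [(0.0, root, root)]
    tree = []
    while heap:
        dist, parent, j = heapq.heappop(heap)   # Step 1: smallest d_j
        if in_tree[j]:
            continue                 # stale entry left by an earlier update
        in_tree[j] = True
        if j != root:
            tree.append((dist, parent, j))
        for length, k in adj[j]:     # Step 2: update d_k for neighbours of j
            if not in_tree[k] and length < d[k]:
                d[k] = length
                heapq.heappush(heap, (length, j, k))
    return tree
```

For a sparse network each node has few neighbours, so Step 2 touches
few heap entries; that is the case in which the heap pays off.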

5. Numerical Experiments

Numerical experiments were carried out on randomly generated
networks. For given n and m, random graphs of m links and n nodes
were generated. Then a random length between 0 and 1 was generated
for each link. The distribution of lengths is of no importance
since, as Rosenstiehl [1976] pointed out, any equivalent pre-ordering
would give the same results; thus, any method which generates
random permutations of the links will suffice. Three series,
four networks each, were used. These are given in Table 3.1.


Series           Networks (n, m)

m = n-1          (10,9)   (50,49)   (100,99)   (500,499)
[illegible]      (10, ?)  (50, ?)   (100, ?)   (500, ?)
m = n(n-1)/2     (5,10)   (10,45)   (20,190)   (40,780)

Table 3.1

One hundred samples of each of the 12 network sizes were
analyzed by each of four algorithms: Prim's, Modified Kruskal's,
Prim's with sorting, and Modified Kruskal's with partial sorting.
Each algorithm was presented with exactly the same networks.

For each of the algorithms and each of the network sizes, the
analysis time for each of a hundred trials was obtained. The
maximum over the 100 trials, the average over the 100 trials and
the standard deviation over the hundred trials were recorded.
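
The experimental procedure can be sketched in Python as follows.
This is a hypothetical harness, not the one used for the reported
timings: the names random_network and time_algorithm are invented
here, link lengths are drawn uniformly from [0, 1), and no attempt
is made to force the random graph to be connected, since the text
does not say how that was handled.

```python
import random
import statistics
import time

def random_network(n, m, rng):
    """Return m distinct random links (length, i, j) on n nodes."""
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return [(rng.random(), i, j) for i, j in rng.sample(all_pairs, m)]

def time_algorithm(algorithm, networks, n):
    """Time algorithm(n, links) on each network.

    Returns (maximum, average, standard deviation) in milliseconds.
    """
    times = []
    for links in networks:
        start = time.perf_counter()
        algorithm(n, links)
        times.append((time.perf_counter() - start) * 1000.0)
    return max(times), statistics.mean(times), statistics.pstdev(times)
```

Generating the list of networks once from a seeded random.Random
and passing the same list to every algorithm reproduces the
condition that each algorithm sees exactly the same networks.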

The computer clock gives results in milliseconds and the timing
routine itself takes less than one half millisecond. The results
are presented in tabular form in Table 3.2 and in graphical
form in Figure 3.1. As is suggested by theory, Prim's works better
for the complete graphs and Kruskal's works better for sparse
graphs. The two algorithms using sophisticated versions of
Treesort yield good results over a wide range of sparsity. The
modified Kruskal Algorithm with partial sorting is apparently
the best choice when speed over a wide range of sparseness is a
criterion.

6. Summary and Conclusion

Speed is not the only measure in choosing an algorithm for
MST calculations. Other important considerations are: storage
requirements, form of input, availability of algorithm and
difficulty of implementation, the particular application, and the
number of problems to be solved.

Prim's Algorithm is superior in many of these respects. It
is very fast for large nearly complete networks; the algorithm is
easy to implement; the storage requirements are quite small,
especially if the network is complete and the link lengths are a
simple function of the end nodes, say Euclidean distance. Then
the link lengths need not be stored at all but are generated once
as needed in Step 2. With the addition of Treesort in Step 1
of the algorithm, Prim's Algorithm becomes much more useful for
sparse graphs; however, the algorithm becomes considerably more
complex and some speed is lost in analyzing complete graphs.


Prim's Algorithm also has other disadvantages. It requires
the link information presented in node incidence format
and it cannot be used for determining minimum spanning forests
with various numbers of components. This latter problem makes
Prim's Algorithm somewhat unsuitable for single linkage cluster
analysis, reliability analysis of networks, and network flow
analysis.


The modified version of Kruskal's Algorithm is very good
for determining MST or minimum spanning forests on sparse
networks and it accepts the links in any form. When it is used
with partial sorting, it gives the best results over the
complete range of sparsity. Moreover, changing to partial sorting
can be done very easily. Both versions of Kruskal's Algorithm
require a relatively large amount of storage because of the list
structures required by Algorithm B and in all cases require the
storage of the complete list of link lengths.


We close by analyzing two specific applications. These
are Monte Carlo network reliability analysis and single linkage
cluster analysis. Most practical pipeline, transportation, or
communication networks are sparse and of reasonable size.
Moreover, for simulation of reliability many hundreds or thousands
of trials must be computed. Finally, if expected fraction of
node pairs communicating is used as a criterion of reliability
[Van Slyke and Frank: 1972] rather than probability of being
connected, it turns out that minimum spanning forests are required.
Thus, the algorithm must be fast for sparse graphs, and it must
be capable of determining minimum spanning forests. Moreover,
storage is usually not a problem. Thus modified Kruskal is
preferred for this application.
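
The reliability criterion mentioned above, the expected fraction of
node pairs communicating, can be illustrated with a plain Monte
Carlo sketch in Python. This is not the spanning-forest procedure of
Van Slyke and Frank; it is a direct simulation under the assumptions
that links fail independently with a common probability p_fail and
that nodes never fail. All function names are hypothetical.

```python
import random

def fraction_pairs_communicating(n, links, up):
    """Fraction of node pairs joined by a path of working links.

    links is a list of (i, j) pairs; up[t] says whether links[t] works.
    """
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (i, j), ok in zip(links, up):
        if ok:
            ri, rj = find(i), find(j)
            if ri != rj:
                if size[ri] < size[rj]:
                    ri, rj = rj, ri
                parent[rj] = ri             # merge smaller into larger
                size[ri] += size[rj]
    roots = {find(x) for x in range(n)}
    connected = sum(size[r] * (size[r] - 1) // 2 for r in roots)
    return connected / (n * (n - 1) / 2)

def estimate_reliability(n, links, p_fail, trials, seed=0):
    """Average the fraction of communicating pairs over random trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        up = [rng.random() >= p_fail for _ in links]
        total += fraction_pairs_communicating(n, links, up)
    return total / trials
```

Each trial is a component computation on a sparse graph, which is
why speed on sparse networks dominates the cost of such a simulation.
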

Single linkage cluster analysis presents a slightly more
difficult case. Here in general the network is complete since
every pair of points is related, which would seem to indicate
Prim's Algorithm; however, the main results required are minimum
spanning forests with various numbers of components (in order
to get the ℓ clusters), which are not available from Prim's
Algorithm. The best compromise until now was suggested by Gower
and Ross [1969], which was to do Prim's Algorithm and toss out all
links not in the MST. This leaves only n-1 links. Then a form of
Kruskal's Algorithm is applied to find the minimum spanning
forests with various numbers of components. This approach is
still desirable when storage is a problem and link length is a
simple function of the end nodes so that the link list need not
be stored. If there is sufficient storage, the modified Kruskal
Algorithm with partial sorting should be much faster.
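
The second stage of the two-stage approach, applying a form of
Kruskal's Algorithm to the n-1 links of the MST, can be sketched as
follows. This is an illustrative Python sketch; the function name
single_linkage_clusters and the (length, i, j) link format are
assumptions, and the MST itself is taken as given.

```python
def single_linkage_clusters(n, mst_links, num_clusters):
    """Split n points into num_clusters single linkage clusters.

    mst_links holds the n-1 links (length, i, j) of a minimum spanning
    tree. Links are merged shortest first until the forest has the
    requested number of components.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for length, i, j in sorted(mst_links):
        if components == num_clusters:
            break                           # forest has enough components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())
```

Stopping the merge at different component counts yields the minimum
spanning forests for each number of clusters from the same sorted
list of n-1 links.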


Appendix A


A list of m = 2^k - 1 numbers n_1, ..., n_{2^k-1} can be identified
with a rooted binary tree with k levels where n_1 is at the root,
n_2 and n_3 are at the next level and, in general, n_{2i} and
n_{2i+1} at level ℓ+1 are connected to n_i at level ℓ. In Figure 3.2
this mapping is illustrated.

The list L = (n_1, ..., n_{2^k-1}), or equivalently the binary tree
associated with it, is called a heap if n_i ≤ n_{2i} and
n_i ≤ n_{2i+1} for i = 1, ..., 2^{k-1} - 1.
The first observation is that if L is a heap then n_1 ≤ n_i for
i = 1, ..., 2^k - 1. If for a given list L = (n_1, ..., n_ℓ) of
length ℓ, ℓ ≠ 2^k - 1 for any k, we choose the smallest k such that
2^k - 1 ≥ ℓ and fill the unused slots with +∞. The number n_i is
called the father of n_{2i} and n_{2i+1}, and n_{2i} and n_{2i+1}
are called the sons of n_i. An element n_i determines a unique
subtree consisting of its sons, its sons' sons, and so on. In
Figure 3.2, a three-level subtree determined by n_i is made up of
n_i, n_{2i}, n_{2i+1}, n_{4i}, n_{4i+1}, n_{4i+2} and n_{4i+3}. A
fundamental operation in Treesort is taking an element n_i for which
the subtrees determined by n_i's two sons n_{2i} and n_{2i+1} are
heaps and, by permuting elements, forming a heap which is the
subtree determined by n_i. This is called sift-up of n_i, although
the way we draw Figure 3.2 it should more properly be called a
sift-down. An even more basic operation is an exchange-test. This
is carried out in two steps. First, n_{2i} and n_{2i+1} are
compared; then the smaller of the two is compared with n_i. If n_i
is larger, n_i is interchanged with the smaller of n_{2i} and
n_{2i+1}. An exchange-test involves two comparisons and (possibly)
one interchange.


A sift-up of n_i at level ℓ is accomplished by performing
an exchange-test at n_i. If there is an interchange, an exchange-test
is performed at the new position of n_i, which is necessarily
at level ℓ+1. The procedure is continued until there is no
interchange or until level k is reached [London: 1970]. If the subtree
determined by n_i has k levels then sift-up takes at most k-1
exchange-tests for a total of 2k-2 comparisons and k-1 interchanges.
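
The exchange-test and sift-up just described can be sketched in
Python as follows. The sketch keeps the 1-based indexing of the text
by leaving slot 0 of the list unused. Instead of padding the list
with +∞ it passes the index of the last real element, which is an
implementation choice made here, not part of the original
description.

```python
def exchange_test(heap, i, last):
    """Compare heap[i] with its smaller son and interchange if needed.

    heap is 1-indexed (heap[0] is unused) and last is the index of the
    final element. Returns the new index of the element that started
    at i, which equals i when no interchange took place.
    """
    left, right = 2 * i, 2 * i + 1
    if left > last:
        return i                      # no sons: nothing to do
    smaller = left
    if right <= last and heap[right] < heap[left]:
        smaller = right               # first comparison: pick smaller son
    if heap[i] > heap[smaller]:       # second comparison: father vs son
        heap[i], heap[smaller] = heap[smaller], heap[i]
        return smaller
    return i

def sift_up(heap, i, last):
    """Repeat exchange-tests until heap[i] finds its level.

    Returns the number of interchanges performed.
    """
    interchanges = 0
    while True:
        j = exchange_test(heap, i, last)
        if j == i:
            return interchanges
        interchanges += 1
        i = j
```

On a three-level tree whose root holds the largest value, the root
travels to the bottom level in two interchanges, the k-1 worst case
stated above.
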

The first part of Treesort consists of establishing a heap.
This is done recursively using sift-up. The subtrees determined
by the elements at level k-1 can be made into heaps in one
exchange-test each. Then the elements at level k-2 are made into
heaps using sift-ups. One works up the tree until finally the
subtree determined by n_1, which is the entire tree, is made into
a heap. An element at level ℓ determines a subtree with (k-ℓ+1)
levels, hence a sift-up could take k-ℓ interchanges and 2k-2ℓ
comparisons. Since there are 2^{ℓ-1} elements at level ℓ, in the
worst case

$$\sum_{\ell=1}^{k} (k-\ell)\, 2^{\ell-1}$$

interchanges and twice that many comparisons would be needed.
Since the length m of the list is 2^k - 1, we have that k ≈ log_2 m
and the order of calculation in the worst case is m - log_2 m
interchanges and 2m - 2 log_2 m comparisons.
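
The heap-building phase and its operation counts can be checked
with a short Python sketch. It is an illustration only: the function
name build_heap is hypothetical, the list is 1-indexed with slot 0
unused, and a bound on the last index replaces the +∞ padding, so a
node with a single son costs one comparison fewer than the
two-comparison exchange-test of the text.

```python
def build_heap(values):
    """Arrange values into a heap, working up from the lowest fathers.

    Returns (heap, interchanges, comparisons) where heap is 1-indexed
    with heap[0] unused.
    """
    heap = [None] + list(values)
    m = len(values)
    interchanges = comparisons = 0
    for i in range(m // 2, 0, -1):      # fathers, lowest level first
        j = i
        while 2 * j <= m:               # sift-up of the element at j
            son = 2 * j
            if son + 1 <= m:
                comparisons += 1        # compare the two sons
                if heap[son + 1] < heap[son]:
                    son += 1
            comparisons += 1            # compare father with smaller son
            if heap[j] <= heap[son]:
                break
            heap[j], heap[son] = heap[son], heap[j]
            interchanges += 1
            j = son
    return heap, interchanges, comparisons
```

For the list 7, 6, 5, 4, 3, 2, 1 (m = 7, k = 3) this uses 4
interchanges and 8 comparisons, which agrees with the bounds
m - log_2 m and 2m - 2 log_2 m when log_2 m is taken as k = 3.
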

Since we have a heap, the top element is the smallest element
of the list. We now enter the second phase of Treesort.
The top element is removed from the heap and saved. The last
finite element of the tree is then put at the top and sift-up
is carried out until it finds its proper level. The newest
top element (the second smallest element of the original list)
is removed and the last finite element is brought to the top.
The number of interchanges in the worst case is h-1 if the tree
has h levels remaining. In general, there are 2^{ℓ-1} sift-ups
carried out on ℓ level trees. Thus there are approximately